Washington State BDC Challenge Analysis¶

This will be a quick analysis of data from the BDC for the State of Washington based on data released from the FCC performed by Nick Pappin of WSU Extension.

All of this data was downloaded from here and will be included in the repository associated with this analysis. We will work in reverse cronological order.

In [1]:
# Housekeeping stuff.
!python -m pip install --upgrade pip
!pip install --upgrade -r requirements.txt > output.txt
import zipfile,  pathlib, itables
import pandas as pd
import numpy as np
import geopandas as gpd
import contextily as cx
import matplotlib.pyplot as plt
from pprint import pprint
Requirement already satisfied: pip in c:\users\nick.pappin\projects\wafabricchallengereview-20230808\venv\lib\site-packages (23.2.1)

Load Data¶

In [2]:
dfColumnHints = {
    "location_id": "Int64",
    "zip_code": "Int64"
}
jul23zip = zipfile.ZipFile("Washington - Fabric Challenge - Resolved - jul23.zip")
jul23 = pd.read_csv(jul23zip.open(jul23zip.filelist[0].filename), dtype=dfColumnHints)
jul23zip.close()
mar23zip = zipfile.ZipFile("Washington - Fabric Challenge - Resolved - mar23.zip")
mar23 = pd.read_csv(mar23zip.open(mar23zip.filelist[0].filename), dtype=dfColumnHints)
mar23zip.close()
dec22zip = zipfile.ZipFile("Washington - Fabric Challenge - Resolved - dec22.zip")
dec22 = pd.read_csv(dec22zip.open(dec22zip.filelist[0].filename), dtype=dfColumnHints)
dec22zip.close()
nov22zip = zipfile.ZipFile("Washington - Fabric Challenge - Resolved - nov22.zip")
nov22 = pd.read_csv(nov22zip.open(nov22zip.filelist[0].filename), dtype=dfColumnHints)
nov22zip.close()
C:\Users\nick.pappin\AppData\Local\Temp\ipykernel_52716\1898070276.py:6: DtypeWarning: Columns (10,13) have mixed types. Specify dtype option on import or set low_memory=False.
  jul23 = pd.read_csv(jul23zip.open(jul23zip.filelist[0].filename), dtype=dfColumnHints)

July - 2023¶

Let's start looking at the number of challenges that were successful and how many were overturned. The full data table will be at the end and the original CSV is in the corresponding zip file.

In [3]:
jul23.adjudication_status.value_counts(dropna=False)
Out[3]:
adjudication_status
Overturned    159956
Upheld         21867
Name: count, dtype: int64

And let's look at it with the category code.

In [4]:
jul23.pivot_table('challenge_id', index='category_code_desc', columns='adjudication_status', aggfunc="count")
Out[4]:
adjudication_status Overturned Upheld
category_code_desc
Add additional address for the location 1 7
Incorrect Location Address 28 403
Incorrect Location Building Type 4432 13604
Incorrect Location Unit Count 2 18
Location is Not Broadband Serviceable 1956 617
Location is Not Within Correct Building Footprint 89 79
Missing Broadband Serviceable Location 153448 7139

Maps¶

First we are going to pull out the locations with no geographic information. This removed about 20K locations.

In [5]:
jul23geo = gpd.GeoDataFrame(jul23[jul23.latitude.notna() & jul23.longitude.notna()])
jul23geo = jul23geo.set_geometry(gpd.points_from_xy(jul23geo.longitude, jul23geo.latitude), crs='EPSG:4326')
jul23geo = jul23geo.to_crs(3857)
droppedChallenges = jul23.shape[0] - jul23geo.shape[0]
print(f'Before dropping challenges without coordinates: {jul23.shape[0]}')
print(f'After dropping challenges without coordinates: {jul23geo.shape[0]}')
print(f'Challenges dropped: {droppedChallenges}')
Before dropping challenges without coordinates: 181823
After dropping challenges without coordinates: 160755
Challenges dropped: 21068

With the no data challenges removed we get this map.

In [6]:
ax = jul23geo.plot(marker='.', markersize = 1, figsize=(15,9), alpha=.5)
cx.add_basemap(ax, crs=jul23geo.crs.to_string(), source=cx.providers.Stamen.TonerLite)
ax = ax.axis('off')
No description has been provided for this image

Let's look at the rejected challenges first.

In [7]:
rejected = jul23geo[jul23geo.adjudication_status == 'Overturned']
ax = rejected.plot(marker='.', markersize = 1, figsize=(15,9), alpha=.5)
cx.add_basemap(ax, crs=rejected.crs.to_string(), source=cx.providers.Stamen.TonerLite)
ax = ax.axis('off')
No description has been provided for this image

Not really much to see here aside from we challenged all over the state.

In [8]:
itables.show(rejected)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id geometry
Loading... (need help?)
In [9]:
itables.show(rejected.adjudication_code_desc.value_counts())
count
adjudication_code_desc
Loading... (need help?)

Unfortunatly with this dataset there is no real way to see or intuit failed challenges by FRN/Organization.

Here are the accepted challenges.

In [10]:
upheld = jul23geo[jul23geo.adjudication_status == 'Upheld']
ax = upheld.plot(marker='.', markersize = 1, figsize=(15,9), alpha=.5)
cx.add_basemap(ax, crs=upheld.crs.to_string(), source=cx.providers.Stamen.TonerLite)
ax = ax.axis('off')
No description has been provided for this image

Interesting to see where there were really successful challenge attempts this go around. And here is an interactive version.

In [11]:
upheld.explore()
Out[11]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [12]:
itables.show(upheld)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id geometry
Loading... (need help?)
In [13]:
itables.show(upheld.adjudication_code_desc.value_counts())
count
adjudication_code_desc
Loading... (need help?)

Data¶

Finally here is the whole table from July of 2023.

In [14]:
itables.show(jul23)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id
Loading... (need help?)

March - 2023¶

Let's start looking at the number of challenges that were successful and how many were overturned. The full data table will be at the end and the original CSV is in the corresponding zip file.

In [15]:
mar23.adjudication_status.value_counts(dropna=False)
Out[15]:
adjudication_status
NaN    13
Name: count, dtype: int64

As you can see there aren't very many challenges for this time period. The maps are all zoomed in around Walla Walla. It is also unclear if these were integrated into the fabric.

Maps¶

In [16]:
mar23geo = gpd.GeoDataFrame(mar23[mar23.latitude.notna() & mar23.longitude.notna()])
mar23geo = mar23geo.set_geometry(gpd.points_from_xy(mar23geo.longitude, mar23geo.latitude), crs='EPSG:4326')
mar23geo = mar23geo.to_crs(3857)
droppedChallenges = mar23.shape[0] - mar23geo.shape[0]
print(f'Before dropping challenges without coordinates: {mar23.shape[0]}')
print(f'After dropping challenges without coordinates: {mar23geo.shape[0]}')
print(f'Challenges dropped: {droppedChallenges}')
Before dropping challenges without coordinates: 13
After dropping challenges without coordinates: 13
Challenges dropped: 0
In [17]:
ax = mar23geo.plot(figsize=(15,9))
cx.add_basemap(ax, crs=mar23geo.crs.to_string(), source=cx.providers.Stamen.TonerLite)
ax = ax.axis('off')
No description has been provided for this image
In [18]:
mar23geo.explore()
Out[18]:
Make this Notebook Trusted to load map: File -> Trust Notebook

I don't really think we need to look to much deeper into this one

Data¶

Finally here is the whole table from March of 2023.

In [19]:
itables.show(mar23)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id
Loading... (need help?)

December 2022¶

Let's start looking at the number of challenges that were successful and how many were overturned. The full data table will be at the end and the original CSV is in the corresponding zip file.

In [20]:
print(dec22.adjudication_status.value_counts(dropna=False))
adjudication_status
Upheld        5269
Overturned    2388
NaN             17
Name: count, dtype: int64
In [21]:
dec22.pivot_table('challenge_id', index='category_code_desc', columns='adjudication_status', aggfunc="count")
Out[21]:
adjudication_status Overturned Upheld
category_code_desc
Incorrect Location Address 108 119
Incorrect Location Building Type 136 4640
Incorrect Location Unit Count 12 132
Location is Not Within Correct Building Footprint 107 19
Missing Broadband Serviceable Location 2025 359

Maps¶

In [22]:
dec22geo = gpd.GeoDataFrame(dec22[dec22.latitude.notna() & dec22.longitude.notna()])
dec22geo = dec22geo.set_geometry(gpd.points_from_xy(dec22geo.longitude, dec22geo.latitude), crs='EPSG:4326')
dec22geo = dec22geo.to_crs(3857)
droppedChallenges = dec22.shape[0] - dec22geo.shape[0]
print(f'Before dropping challenges without coordinates: {dec22.shape[0]}')
print(f'After dropping challenges without coordinates: {dec22geo.shape[0]}')
print(f'Challenges dropped: {droppedChallenges}')
Before dropping challenges without coordinates: 7674
After dropping challenges without coordinates: 2527
Challenges dropped: 5147

Here is the map of all of the challenges with data points with no geographic coordinates removed.

In [23]:
ax = dec22geo.plot(marker='.', markersize = 1, figsize=(15,9), alpha=.5)
cx.add_basemap(ax, crs=dec22geo.crs.to_string(), source=cx.providers.Stamen.TonerLite)
ax = ax.axis('off')
No description has been provided for this image

And lets look at the rejected challenge map first.

In [24]:
rejected = dec22geo[dec22geo.adjudication_status == 'Overturned']
ax = rejected.plot(marker='.', markersize = 1, figsize=(15,9), alpha=.5)
cx.add_basemap(ax, crs=rejected.crs.to_string(), source=cx.providers.Stamen.TonerLite)
ax = ax.axis('off')
No description has been provided for this image

And here is an interactive version of this map.

In [25]:
rejected.explore()
Out[25]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [26]:
itables.show(rejected)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id geometry
Loading... (need help?)
In [27]:
itables.show(rejected.category_code_desc.value_counts())
count
category_code_desc
Loading... (need help?)

And now lets look at the accepted challenges.

In [28]:
upheld = dec22geo[dec22geo.adjudication_status == 'Upheld']
ax = upheld.plot(marker='.', markersize = 5, figsize=(15,9), alpha=.5)
cx.add_basemap(ax, crs=upheld.crs.to_string(), source=cx.providers.Stamen.TonerLite)
ax = ax.axis('off')
No description has been provided for this image

Again, an interactive version.

In [29]:
upheld.explore()
Out[29]:
Make this Notebook Trusted to load map: File -> Trust Notebook
In [30]:
itables.show(upheld)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id geometry
Loading... (need help?)

Data¶

Finally here is the whole table from December of 2022.

In [31]:
itables.show(dec22)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id
Loading... (need help?)

November 2022¶

Let's start looking at the number of challenges that were successful and how many were overturned. The full data table will be at the end and the original CSV is in the corresponding zip file.

In [32]:
print(nov22.category_code_desc.value_counts(dropna=False))
category_code_desc
Missing Broadband Serviceable Location    4
Name: count, dtype: int64

Maps¶

Here is the map of the data for November. Pretty self explanitory.

In [33]:
nov22geo = gpd.GeoDataFrame(nov22[nov22.latitude.notna() & nov22.longitude.notna()])
nov22geo = nov22geo.set_geometry(gpd.points_from_xy(nov22geo.longitude, nov22geo.latitude), crs='EPSG:4326')
nov22geo = nov22geo.to_crs(3857)
droppedChallenges = nov22.shape[0] - nov22geo.shape[0]
print(f'Before dropping challenges without coordinates: {nov22.shape[0]}')
print(f'After dropping challenges without coordinates: {nov22geo.shape[0]}')
print(f'Challenges dropped: {droppedChallenges}')
Before dropping challenges without coordinates: 4
After dropping challenges without coordinates: 4
Challenges dropped: 0
In [34]:
nov22geo.explore()
Out[34]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Data¶

Finally here is the whole table from November of 2022.

In [35]:
itables.show(nov22)
challenge_id fabric_vintage category_code category_code_desc location_id location_state address_primary city state zip_code zip_code_suffix unit_count building_type_code non_bsl_code bsl_lacks_address_flag address_id latitude longitude date_received withdraw_date adjudication_status adjudication_date adjudication_code adjudication_code_desc fabric_response_location_id
Loading... (need help?)

Appendix A - Data references¶

In [36]:
adjudicationCodes = jul23[['adjudication_code', 'adjudication_code_desc']].set_index('adjudication_code').drop_duplicates().reset_index()
itables.show(adjudicationCodes)
adjudication_code adjudication_code_desc
Loading... (need help?)
In [37]:
jul23.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 181823 entries, 0 to 181822
Data columns (total 25 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   challenge_id                 181823 non-null  int64  
 1   fabric_vintage               181823 non-null  object 
 2   category_code                181823 non-null  int64  
 3   category_code_desc           181823 non-null  object 
 4   location_id                  21236 non-null   Int64  
 5   location_state               181823 non-null  object 
 6   address_primary              161014 non-null  object 
 7   city                         161026 non-null  object 
 8   state                        161026 non-null  object 
 9   zip_code                     161025 non-null  Int64  
 10  zip_code_suffix              937 non-null     object 
 11  unit_count                   160607 non-null  float64
 12  building_type_code           178623 non-null  object 
 13  non_bsl_code                 2573 non-null    object 
 14  bsl_lacks_address_flag       160784 non-null  float64
 15  address_id                   0 non-null       float64
 16  latitude                     160755 non-null  float64
 17  longitude                    160755 non-null  float64
 18  date_received                181823 non-null  object 
 19  withdraw_date                0 non-null       float64
 20  adjudication_status          181823 non-null  object 
 21  adjudication_date            181823 non-null  object 
 22  adjudication_code            181823 non-null  float64
 23  adjudication_code_desc       181823 non-null  object 
 24  fabric_response_location_id  7182 non-null    float64
dtypes: Int64(2), float64(8), int64(2), object(13)
memory usage: 35.0+ MB